Add Zamba2 #34517
base: main
Conversation
Rebase zamba2
Hey @Arthur, thank you again for your help in getting Zamba2 into transformers. A few remarks, mostly related to … I carefully compared … Looking forward to your feedback. Thanks so much!
rebase on upstream
Hey @pglorio, so sorry for the delay! I think the first error you encounter comes from a wrong … EDIT: actually, the only failing tests I see are the Zamba2 tests.
Hey! Here is a first round of reviews, you can check out the comments! For the lora layers, maybe rename them `residual` instead of `lora` to make it explicit that they are part of the original model?
You should also make sure that all the tests you added pass; this is actually not the case right now!
Thank you @Cyrilvallez for the super helpful feedback! Please see above the replies to each comment. It seems the zamba2-related tests all pass locally for us, but not in the circle/ci setting of the PR. The error is always … for all the tests that fail. We investigated the discrepancy by examining more closely the test … Also, we are happy to rename the lora layers as you suggested here and agree it would clarify the role of this module. Would … Thank you so much again!
We renamed lora -> adapter here and, following your suggestion in this comment, we simplified the adapter using … We re-ran the model tests and they all pass locally, but it looks like they still fail in this PR environment.
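For readers following the rename discussion, here is a minimal sketch of what a LoRA-style adapter does: a frozen base projection plus a low-rank correction that is added to its output. This uses plain Python lists for clarity; the function and variable names are illustrative only, not the actual Zamba2 module.

```python
# Hedged sketch of a LoRA-style adapter (illustrative, not the Zamba2 code).
# y = W x + B (A x), where A projects down to rank r and B projects back up.

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter_forward(W, A, B, x):
    base = matvec(W, x)                 # frozen base projection
    low_rank = matvec(B, matvec(A, x))  # rank-r correction
    return [b + c for b, c in zip(base, low_rank)]
```

Since the low-rank path is additive to the base weight's output, calling it `adapter` (or `residual`, as suggested above) rather than `lora` signals that it is a permanent part of the model, not a fine-tuning attachment.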
Nice! I pointed out some minor issues, but this is almost ready! 🤗 However, you might want to consider switching the attention implementation to our newest standard here #35235 (see e.g. the llama model as an example). This is not strictly needed, but should be quite straightforward from the examples, and will help both readability and maintenance in the long run. And as this change was introduced to make transformers compatible with high-performance inference frameworks such as vLLM and TGI, it would potentially make Zamba2 compatible as well (not sure how we will support stateful models, and if the shapes in the Mixers are compatible, but it cannot hurt!) in the future!
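The core idea behind the attention standard referenced in #35235 can be sketched as a single eager implementation plus a registry that maps the configured implementation name to a backend callable. The code below is a simplified, self-contained mimic of that dispatch pattern; the names and signatures are assumptions for illustration, not the real transformers API.

```python
# Sketch of the attention-interface dispatch pattern (assumed simplification
# of #35235): one eager reference implementation, a registry of backends,
# and a lookup keyed on the config's attention-implementation string.
import math

def eager_attention_forward(query, key, value, scaling):
    # Plain softmax(q.k * scaling)-weighted sum over value vectors.
    outputs = []
    for q in query:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) * scaling for k in key]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[d] for w, v in zip(weights, value))
                        for d in range(len(value[0]))])
    return outputs

# Registry of named backends; real code would also register sdpa, flash, etc.
ALL_ATTENTION_FUNCTIONS = {"eager": eager_attention_forward}

def attend(attn_implementation, q, k, v, scaling):
    # Fall back to eager when the requested backend is not registered.
    interface = ALL_ATTENTION_FUNCTIONS.get(attn_implementation,
                                            eager_attention_forward)
    return interface(q, k, v, scaling)
```

The benefit for a model like Zamba2 is that the attention layer itself stays backend-agnostic: adding or swapping an optimized kernel is a registry entry, not a code change in the model.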
Thank you so much for this feedback @Cyrilvallez. I realized that the issue with the unit tests was inside the …

By the way, we originally took the …

```python
import torch
from transformers import Mamba2Config, Mamba2ForCausalLM

config = Mamba2Config(
    num_heads=8,
    n_groups=8,
    state_size=2,
    head_dim=8,
    conv_kernel=4,
    chunk_size=8,
    vocab_size=99,
    hidden_size=32,
    num_hidden_layers=4,
    hidden_act="silu",
    hidden_dropout_prob=0.1,
    max_position_embeddings=512,
)
model = Mamba2ForCausalLM(config)
inputs = {
    "input_ids": torch.tensor([[86, 6, 51, 3, 12, 15, 33, 18, 4, 92],
                               [69, 66, 49, 45, 48, 44, 61, 56, 68, 85]]),
    "attention_mask": torch.tensor([[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
                                    [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]]),
}
outputs_cpu = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                             output_scores=False, use_cache=True, num_beams=1, do_sample=False)
model = model.to("cuda")
inputs = {key: tensor.to(device=model.device) for key, tensor in inputs.items()}
outputs_cuda = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                              output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print(torch.all(outputs_cpu == outputs_cuda.cpu()).item())
```

returns …
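A plausible explanation for this kind of CPU/CUDA mismatch (an assumption about the root cause, not something confirmed in this thread) is that floating-point addition is not associative, and the two backends may reduce sums in different orders. The tiny pure-Python example below shows how the same four numbers summed in two orders give different results:

```python
# Floating-point addition is not associative: different reduction orders
# (as used by different hardware backends) can give different results.
vals = [1e16, 1.0, -1e16, 1.0]

# Strict left-to-right reduction: the 1.0 terms are absorbed by 1e16.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # 1.0

# Pairwise reduction: the large terms cancel first, so both 1.0s survive.
pairwise = (vals[0] + vals[2]) + (vals[1] + vals[3])        # 2.0

print(left_to_right, pairwise)
```

When such rounding differences flip the argmax between two near-tied logits, greedy decoding diverges from that token on, which would explain tests that pass on one device and fail on another.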
What does this PR do?
Adds support for the Zamba2 architecture created by Zyphra Technologies.
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker